Identification of metabolites from complex mixtures by 3D correlation of 1H NMR, MS and LC data using the SCORE-metabolite-ID approach

Not only in metabolomics studies but also in natural product chemistry, reliable identification of metabolites usually requires laborious isolation and purification steps and remains a bottleneck in many studies. Identifying metabolites directly in a complex mixture, without isolating them individually, is therefore the preferred approach; however, because natural products contain a large number of metabolites, this approach is often hampered by signal overlap in the corresponding 1H NMR spectra. This paper presents a method for the three-dimensional mathematical correlation of NMR and MS data, using the time course of a chromatographic fractionation as the third dimension. The MATLAB application SCORE-metabolite-ID (Semi-automatic COrrelation analysis for REliable metabolite IDentification) semi-automatically detects correlated NMR and MS data, allowing NMR signals to be related to the associated mass-to-charge ratios in ESI mass spectra. This approach enables fast and reliable dereplication of known metabolites and facilitates dynamic analysis aimed at identifying unknown compounds in any complex mixture. The strategy was validated using an artificial mixture and further tested on a polar extract of a pine nut sample. Forty metabolites were identified straightforwardly, including β-D-glucopyranosyl-1-N-indole-3-acetyl-N-L-aspartic acid (1) and Nα-(2-hydroxy-2-carboxymethylsuccinyl)-L-arginine (2), the latter being identified in a food sample for the first time.
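To illustrate the central idea, the sketch below computes the Pearson correlation coefficient between one NMR signal and one MS signal followed across the chromatographic fractions. It is a minimal Python sketch, not the MATLAB code of SCORE-metabolite-ID, and it assumes that each signal has already been reduced to one intensity value per fraction; the profile values are invented for illustration.

```python
from math import sqrt
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long intensity profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (>= 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Covariance and variances without the common 1/(n-1) factor (it cancels).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant profile has no defined correlation")
    return cov / sqrt(var_x * var_y)


# Hypothetical intensities of one NMR signal and one m/z trace in fractions 1-8.
nmr_profile = [0.0, 0.1, 0.8, 2.5, 3.9, 2.2, 0.7, 0.1]
ms_profile = [0.0, 0.2, 1.5, 5.2, 8.1, 4.3, 1.2, 0.3]

r = pearson(nmr_profile, ms_profile)
print(f"r = {r:.3f}")
```

A coefficient close to 1 means that both signals rise and fall together over the fractions, which is what allows an NMR signal to be related to an m/z value.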


Table of Contents
− Table S1: Preparation of the samples for the artificial mixture
− Table S2: Correlation coefficients of EDCs of L-leucine and L-isoleucine in the artificial mixture
− Table S3: Comprehensive list of all identified metabolites in the polar extract of the pine nut sample, including the respective correlation coefficients
− Table S4: Semi-automatic detection of highly correlating EMCs to a specific EDC of (1)
− Table S5: Correlation coefficients between all EDCs of (1)
− Table S6: Correlation coefficients between EDCs and EMCs of (1)
− Table S7: Semi-automatic detection of highly correlating EMCs to a specific EDC of (2)
− Table S8: Correlation coefficients between EDCs and EMCs of (2)

Table S1: Amount of stock solution [µL] added to each sample, i.e. each fraction, for the preparation of the artificial mixture. Each stock solution contains one pure compound dissolved in water at a final concentration of 0.1 M.

List of further identified metabolites
In total, 40 metabolites were identified in the polar extract of pine nuts. Table S3 shows the calculated correlation coefficients for 38 of these metabolites. Correlation coefficients of β-D-glucopyranosyl-1-N-

Figure S1: DI-MS spectra of the artificial mixture in positive ionization mode.

Figure S2: Selective TOCSY experiments of fraction 55 of the polar pine nut extract. The bottom trace shows the NMR spectrum of the whole fraction 55. The frequencies irradiated for selective excitation in the two upper spectra are marked by asterisks. The middle spectrum shows the signals of the D-glucose spin system. The top spectrum shows the signals of an ABX spin system, which was identified as L-aspartic acid. The selective TOCSY experiments were acquired using the seldigpzs pulse sequence with 4 dummy scans, 512 scans and 32 768 data points.

Figure S3: Part of the 2D HSQC spectrum (red and blue peaks) and the 2D HMBC spectrum (black peaks) of fraction 55 of the polar pine nut extract containing IAA-Asp-N-Glc (1). The correlation peaks from H-1' (5.63 ppm, anomeric proton of glucose) to C-2 (127.2 ppm) and C-7a (139.4 ppm), which confirm the glucose-N-indole bond, are highlighted in green. The HSQC spectrum was acquired using the hsqcedetgpsp.3 pulse sequence with 2048 data points in F2 and 256 increments; 32 dummy scans and 128 scans were acquired. The HMBC spectrum was acquired using the hmbcetgpl3nd pulse sequence with 4096 data points in F2 and 256 increments; 16 dummy scans and 256 scans were acquired.

Figure S4: 1H NMR spectra of fraction 55 of the polar pine nut extract containing IAA-Asp-N-Glc (1) in phosphate buffer in D2O (lower spectrum) and in H2O/D2O (9:1) (upper spectrum). An additional doublet at 7.97 ppm appears in the spectrum acquired in H2O/D2O. Furthermore, the multiplicity of the signal at 4.43 ppm changes from dd in D2O to ddd in H2O/D2O because of the additional coupling to the amide proton at 7.97 ppm.

Figure S5: Selective TOCSY experiments of fraction 67 of the polar pine nut extract. The bottom trace shows the 1H NMR spectrum of the whole fraction 67. The frequencies irradiated for selective excitation in the TOCSY experiments (upper three spectra) are marked by asterisks. The three spin systems obtained from the three selective TOCSY experiments show high correlation coefficients over a series of fractions. The compound was identified as the condensation product of citric acid and L-arginine, Nα-(2-hydroxy-2-carboxymethylsuccinyl)-L-arginine (2). The selective TOCSY spectra were acquired using the seldigpzs pulse sequence; for all three spectra, 4 dummy scans, 512 scans and 32 768 data points were acquired.

Figure S6: 2D HSQC spectrum of fraction 68 of the polar pine nut extract containing Nα-(2-hydroxy-2-carboxymethylsuccinyl)-L-arginine (2). The HSQC spectrum was acquired using the hsqcedetgpsisp2.2 pulse sequence on a Bruker Avance NEO 600 MHz NMR spectrometer equipped with a 5 mm TCI Cryoprobe cooled with liquid nitrogen and operated with TopSpin 4.1.3. The spectrum was acquired with 2048 data points in the F2 dimension and 128 increments using non-uniform sampling (NUS amount 50%). After 16 dummy scans, 384 scans were acquired.

Figure S7: 2D HMBC spectrum of fraction 68 of the polar pine nut extract containing Nα-(2-hydroxy-2-carboxymethylsuccinyl)-L-arginine (2). The HMBC spectrum was acquired using the hmbcetgpl3nd pulse sequence on a Bruker Avance NEO 600 MHz NMR spectrometer equipped with a 5 mm TCI Cryoprobe cooled with liquid nitrogen and operated with TopSpin 4.1.3. The spectrum was acquired with 4096 data points in the F2 dimension and 256 increments using non-uniform sampling (NUS amount 50%). After 32 dummy scans, 512 scans were acquired. The enlarged region shows the correlation peak from H-2' to C-1', which has a signal-to-noise ratio of 3.3.

Table S2: Pearson correlation coefficients calculated between the NMR signals of L-leucine and L-isoleucine in the artificial mixture over all 26 fractions. Coefficients > 0.95 are highlighted in green.
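The pairwise comparison behind Table S2 can be sketched in Python as follows: every pair of EDC profiles is correlated, and pairs above the 0.95 threshold used for highlighting are reported. Signals of the same compound should follow the same elution profile and therefore correlate strongly, whereas signals of two compounds with shifted profiles should not. This is an illustrative sketch, not the SCORE-metabolite-ID implementation; the signal names (Leu_A, Ile_A, and so on) and the eight-fraction profiles are hypothetical, whereas the actual analysis covered 26 fractions.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def correlated_pairs(profiles, threshold=0.95):
    """Return every EDC pair whose profiles correlate above `threshold`."""
    hits = []
    for (name_a, prof_a), (name_b, prof_b) in combinations(profiles.items(), 2):
        r = pearson(prof_a, prof_b)
        if r > threshold:
            hits.append((name_a, name_b, round(r, 3)))
    return hits


# Hypothetical EDC profiles (one intensity per fraction) of two signals each
# of L-leucine and L-isoleucine; the values are illustrative only.
edcs = {
    "Leu_A": [0, 0, 1, 4, 6, 3, 1, 0],
    "Leu_B": [0, 0, 2, 8, 12, 6, 2, 0],
    "Ile_A": [0, 2, 5, 3, 1, 0, 0, 0],
    "Ile_B": [0, 1, 2.6, 1.4, 0.5, 0, 0, 0],
}

for a, b, r in correlated_pairs(edcs):
    print(f"{a} ~ {b}: r = {r}")
```

With these invented profiles, only the two leucine signals and the two isoleucine signals are grouped together.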

Table S3: List of further identified metabolites in the polar extract of pine nuts. The left column lists each metabolite together with its exact mass and the range of fractions over which the correlation coefficients were calculated. For each metabolite, the right column contains a table of correlation coefficients between specific EDCs and EMCs or between specific EDCs. Correlation coefficients > 0.95 are highlighted in green.

Table S4: The eleven most intense EMC signals in positive ionization mode that show a correlation coefficient > 0.96 to the EDC at 7.485 ppm of (1) in fractions 51–65 of the polar pine nut extract, including the peak assignments.
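The semi-automatic detection summarized in Table S4 can be read as three steps: restrict all profiles to a fraction window, keep the EMCs whose correlation with a chosen EDC exceeds a threshold, and rank the remaining EMCs by intensity. The Python sketch below follows that reading under stated assumptions: "most intense" is taken to mean the maximum intensity within the window, the EMC labels and profiles are hypothetical, and the function name `top_correlating_emcs` is illustrative rather than part of SCORE-metabolite-ID.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a flat trace carries no elution information
    return cov / sqrt(vx * vy)


def top_correlating_emcs(edc, emcs, first, last, threshold=0.96, top_n=11):
    """Rank the EMCs that correlate with one EDC inside fractions first..last.

    Fractions are numbered from 1 and the window is inclusive, so profiles
    are sliced as [first - 1:last]. Hits above `threshold` are sorted by
    their maximum intensity in the window (an assumed ranking criterion),
    and the `top_n` most intense hits are returned.
    """
    window = slice(first - 1, last)
    ref = edc[window]
    hits = []
    for label, profile in emcs.items():
        trace = profile[window]
        r = pearson(ref, trace)
        if r > threshold:
            hits.append((label, r, max(trace)))
    hits.sort(key=lambda hit: hit[2], reverse=True)
    return hits[:top_n]


# Hypothetical EDC and EMC profiles over fractions 1-10.
edc = [5, 0, 0, 1, 4, 9, 4, 1, 0, 7]
emcs = {
    "EMC_1": [0, 0, 0, 10, 40, 90, 40, 10, 0, 0],
    "EMC_2": [0, 0, 0, 2, 8, 18, 8, 2, 0, 0],
    "EMC_3": [0, 0, 9, 4, 1, 0, 0, 0, 0, 0],
    "EMC_4": [0, 0, 5, 5, 5, 5, 5, 5, 0, 0],
}

for label, r, peak in top_correlating_emcs(edc, emcs, first=3, last=8):
    print(f"{label}: r = {r:.3f}, max intensity = {peak}")
```

Restricting the calculation to the fraction window matters: outside fractions 3 to 8, the hypothetical EDC contains unrelated intensity that would lower its correlation with every EMC.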

Table S5: Pearson correlation coefficients between several EDCs of (1) in fractions 50–65 of the polar pine nut extract.

Table S6: Pearson correlation coefficients between EDCs and EMCs in positive ionization mode of compound (1), calculated in fractions 51–65. Coefficients > 0.95 are highlighted in green.

Table S7: The five most intense EMC signals in positive ionization mode that show a correlation coefficient > 0.96 to the EDC at 2.456 ppm of (2) in fractions 60–73 of the polar pine nut extract, including the peak assignments.

Table S8: Pearson correlation coefficients between EDCs and EMCs in positive and negative ionization mode of compound (2), calculated in fractions 60–73. Coefficients > 0.95 are highlighted in green.
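Table S8 combines EMCs from both ionization modes in one EDC × EMC grid restricted to a fraction window. A minimal Python sketch of how such a grid could be assembled is shown below; it is not the authors' MATLAB code, the EDC and EMC labels and profiles are hypothetical, and the asterisk is only a plain-text stand-in for the green highlighting of coefficients > 0.95.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def edc_emc_matrix(edcs, emcs_pos, emcs_neg, first, last):
    """Correlate every EDC with every EMC of both ionization modes.

    Fractions are numbered from 1; the window first..last is inclusive.
    Returns the column labels and one row of coefficients per EDC.
    """
    window = slice(first - 1, last)
    # Tag each EMC with its ionization mode so both modes share one table.
    emcs = {f"(+) {k}": v for k, v in emcs_pos.items()}
    emcs.update({f"(-) {k}": v for k, v in emcs_neg.items()})
    columns = list(emcs)
    rows = {
        name: [pearson(profile[window], emcs[col][window]) for col in columns]
        for name, profile in edcs.items()
    }
    return columns, rows


def format_table(columns, rows, threshold=0.95):
    """Tab-separated table; '*' marks coefficients above the threshold."""
    lines = ["EDC\t" + "\t".join(columns)]
    for name, values in rows.items():
        cells = [f"{v:.3f}{'*' if v > threshold else ''}" for v in values]
        lines.append(name + "\t" + "\t".join(cells))
    return "\n".join(lines)


# Hypothetical profiles over fractions 1-6; the window covers fractions 2-5.
edcs = {"EDC_1": [9, 1, 3, 5, 2, 0], "EDC_2": [0, 2, 6, 10, 4, 8]}
emcs_pos = {"EMC_a": [0, 10, 30, 50, 20, 0]}
emcs_neg = {"EMC_b": [0, 5, 1, 0, 3, 0]}

columns, rows = edc_emc_matrix(edcs, emcs_pos, emcs_neg, first=2, last=5)
print(format_table(columns, rows))
```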